---
title: "Using a delay-adjusted case fatality ratio to estimate under-reporting"
description: "Using a corrected case fatality ratio, we calculate estimates of the level of under-reporting for any country with greater than ten deaths"
status: real-time-report
rmarkdown_html_fragment: true
update: 2020-06-02
authors:
  - id: tim_russell
    corresponding: true
  - id: joel_hellewell
    equal: 1
  - id: sam_abbott
    equal: 1
  - id: nick_golding
  - id: hamish_gibbs
  - id: chris_jarvis
  - id: kevin_vanzandvoort
  - id: ncov-group
  - id: stefan_flasche
  - id: roz_eggo
  - id: john_edmunds
  - id: adam_kucharski
---

Aim

To estimate the percentage of symptomatic COVID-19 cases reported in different countries using case fatality ratio estimates based on data from the ECDC, correcting for delays between confirmation and death.

Methods Summary

Current estimates for percentage of symptomatic cases reported for countries with greater than ten deaths

Temporal variation

Figure 1: Temporal variation in reporting rate. We calculate the percentage of symptomatic cases reported on each day a country has had more than ten deaths. We then fit a Gaussian Process (GP) to these data (see Temporal variation model fitting section for details), highlighting the temporal trend of each country's reporting rate. The red shaded region is the 95% CrI of the fitted GP.

Adjusted symptomatic case estimates

Figure 2: Estimated number of new symptomatic cases, calculated using our temporal under-reporting estimates. For each country with an under-reporting estimate, we adjust the reported case numbers each day using our temporal under-reporting estimates to arrive at an estimate of the true number of symptomatic cases each day. The shaded blue region represents the 95% CrI, calculated directly using the 95% CrI of the temporal under-reporting estimate.

Reported cases

Figure 3: Reported number of cases each day, pulled from the ECDC and plotted against time for comparison with our estimated true numbers of symptomatic cases each day, adjusted using our under-reporting estimates.

Table of current estimates

Country Percentage of symptomatic cases reported (95% CI) Total cases Total deaths
Afghanistan 82% (64%-97%) 15,750 265
Albania 87% (45%-100%) 1,143 33
Algeria 33% (23%-46%) 9,513 661
Andorra 23% (11%-44%) 765 51
Argentina 50% (38%-63%) 17,402 556
Armenia 60% (43%-79%) 9,492 139
Australia 87% (56%-100%) 7,204 103
Austria 26% (13%-46%) 16,663 668
Azerbaijan 82% (58%-100%) 5,662 68
Bahamas 59% (17%-100%) 102 11
Bahrain 98% (90%-100%) 11,804 19
Bangladesh 78% (60%-100%) 49,534 672
Belarus 99% (91%-100%) 43,403 240
Belgium 27% (21%-33%) 58,517 9,486
Bolivia 33% (25%-42%) 10,531 343
Bosnia and Herzegovina 35% (16%-66%) 2,523 153
Brazil 26% (21%-31%) 526,447 29,937
Bulgaria 23% (16%-32%) 2,513 144
Burkina Faso 42% (20%-81%) 881 53
Cameroon 46% (25%-87%) 6,397 199
Canada 14% (11%-17%) 91,694 7,326
Chad 51% (27%-81%) 790 66
Chile 82% (66%-98%) 105,159 1,113
China 91% (26%-100%) 84,154 4,638
Colombia 36% (28%-45%) 30,493 969
Congo 50% (25%-90%) 611 20
Costa Rica 78% (40%-100%) 1,084 10
Cote d'Ivoire 90% (67%-100%) 2,951 33
Croatia 24% (12%-45%) 2,246 103
Cuba 65% (34%-97%) 2,083 83
Cyprus 85% (52%-100%) 949 17
Czechia 44% (31%-64%) 9,302 321
Democratic Republic of the Congo 84% (55%-100%) 3,194 72
Denmark 59% (39%-82%) 11,699 576
Djibouti 83% (55%-100%) 3,569 24
Dominican Republic 84% (67%-97%) 17,572 502
Ecuador 30% (24%-37%) 39,994 3,394
Egypt 27% (21%-34%) 26,384 1,005
El Salvador 62% (41%-91%) 2,582 46
Equatorial Guinea 83% (52%-100%) 1,306 12
Estonia 34% (20%-55%) 1,870 68
Finland 60% (33%-90%) 6,885 320
France 22% (17%-26%) 152,091 28,833
Gabon 94% (77%-100%) 2,655 17
Georgia 78% (41%-100%) 796 12
Germany 40% (31%-50%) 182,028 8,522
Ghana 98% (89%-100%) 8,070 36
Greece 25% (15%-39%) 2,917 175
Guatemala 36% (26%-48%) 5,336 116
Guernsey 46% (14%-96%) 252 13
Guinea 95% (80%-100%) 3,844 23
Guyana 46% (13%-97%) 153 12
Haiti 55% (31%-86%) 2,226 45
Honduras 43% (30%-59%) 5,362 217
Hungary 11% (7.3%-16%) 3,921 532
Iceland 87% (53%-100%) 1,806 10
India 35% (29%-42%) 198,706 5,598
Indonesia 21% (17%-26%) 26,940 1,641
Iran 51% (41%-62%) 154,445 7,878
Iraq 27% (19%-38%) 6,868 215
Ireland 24% (15%-35%) 25,062 1,650
Isle of Man 37% (9.6%-97%) 336 24
Israel 80% (57%-99%) 17,219 287
Italy 14% (12%-17%) 233,197 33,475
Japan 15% (11%-21%) 16,930 894
Jersey 17% (6.8%-46%) 308 29
Kazakhstan 98% (90%-100%) 11,571 41
Kenya 32% (20%-53%) 2,021 69
Kosovo 66% (37%-99%) 1,064 30
Kuwait 96% (84%-100%) 27,762 220
Kyrgyzstan 88% (62%-100%) 1,845 17
Latvia 49% (25%-87%) 1,071 24
Lebanon 81% (47%-100%) 1,233 27
Liberia 26% (9.7%-71%) 296 27
Lithuania 24% (11%-39%) 1,678 70
Luxembourg 51% (32%-74%) 4,019 110
Malaysia 97% (82%-100%) 7,857 115
Mali 20% (13%-28%) 1,315 78
Mauritius 62% (18%-100%) 335 10
Mexico 13% (10%-15%) 93,435 10,167
Moldova 30% (23%-39%) 8,360 307
Morocco 98% (83%-100%) 7,833 205
Netherlands 22% (17%-28%) 46,545 5,962
New Zealand 49% (21%-91%) 1,154 22
Nicaragua 81% (48%-100%) 759 35
Niger 19% (9.3%-35%) 958 65
Nigeria 44% (30%-60%) 10,578 299
North Macedonia 13% (7.3%-21%) 2,315 140
Norway 82% (50%-100%) 8,411 236
Oman 97% (88%-100%) 12,223 50
Pakistan 34% (27%-43%) 76,398 1,621
Panama 59% (43%-76%) 13,837 344
Paraguay 92% (68%-100%) 995 11
Peru 41% (31%-49%) 170,039 4,634
Philippines 40% (28%-53%) 18,638 960
Poland 56% (43%-70%) 24,165 1,074
Portugal 26% (19%-34%) 32,700 1,424
Puerto Rico 74% (50%-96%) 3,873 136
Qatar 90% (53%-100%) 58,433 40
Romania 25% (19%-32%) 19,398 1,279
Russia 37% (31%-45%) 423,741 5,037
San Marino 83% (35%-100%) 687 42
Saudi Arabia 87% (54%-100%) 87,142 525
Senegal 87% (64%-100%) 3,739 42
Serbia 94% (73%-100%) 11,430 244
Sierra Leone 40% (20%-69%) 861 46
Singapore 90% (56%-100%) 35,292 24
Sint Maarten 12% (4.1%-33%) 77 15
Slovakia 73% (43%-99%) 1,522 28
Slovenia 18% (9.9%-32%) 1,483 108
Somalia 40% (19%-70%) 2,023 79
South Africa 40% (32%-48%) 34,357 705
South Korea 60% (30%-96%) 11,541 272
Spain 9.4% (7.7%-11%) 239,638 27,940
Sudan 9.3% (6.9%-12%) 5,173 298
Sweden 19% (15%-24%) 37,814 4,403
Switzerland 30% (21%-41%) 30,788 1,656
Tajikistan 97% (86%-100%) 3,930 47
Thailand 78% (49%-100%) 3,082 57
Togo 75% (37%-100%) 443 13
Tunisia 52% (21%-98%) 1,084 48
Turkey 77% (62%-91%) 164,769 4,563
Ukraine 42% (31%-55%) 24,340 727
United Arab Emirates 99% (92%-100%) 35,192 266
United Kingdom 14% (11%-16%) 276,332 39,045
United Republic of Tanzania 57% (28%-96%) 509 21
United States of America 35% (29%-41%) 1,811,277 105,147
Uruguay 45% (22%-83%) 825 23
Uzbekistan 95% (80%-100%) 3,718 15
Venezuela 88% (61%-100%) 1,662 17
Yemen 3.2% (2.2%-4.5%) 354 84

Table 1: Estimates for the proportion of symptomatic cases reported in different countries using cCFR estimates based on case and death time-series data from the ECDC. Total cases and deaths in each country are also shown. Confidence intervals were calculated using an exact binomial test at the 95% confidence level.
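The interval returned by an exact binomial test is the Clopper-Pearson interval. The following is a minimal sketch of that interval using only the Python standard library. It assumes the interval is computed for a proportion of `k` deaths among `n` cases; the function names and the toy numbers are hypothetical and are not taken from the analysis code.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(1.0, total)

def solve_p(m, n, target, iterations=100):
    """Find p such that P(X <= m) = target; the CDF decreases as p increases."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if binom_cdf(m, n, mid) > target:
            lo = mid  # CDF still too high, so p must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided interval for a binomial proportion."""
    lower = 0.0 if k == 0 else solve_p(k - 1, n, 1.0 - alpha / 2.0)
    upper = 1.0 if k == n else solve_p(k, n, alpha / 2.0)
    return lower, upper

# Toy example (hypothetical numbers): 4 deaths among 42 cases.
lower, upper = clopper_pearson(4, 42)
print(f"point estimate {4 / 42:.3f}, 95% interval ({lower:.3f}, {upper:.3f})")
```

The bisection avoids any dependency on a statistics library; for large `n` a beta-quantile routine would be faster.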

Adjusting for outcome delay in CFR estimates

During an outbreak, the naive CFR (nCFR), i.e. the ratio of reported deaths to date to reported cases to date, will underestimate the true CFR because the outcome (recovery or death) is not known for all cases [5]. We can therefore estimate the true denominator for the CFR (i.e. the number of cases with known outcomes) by accounting for the delay from confirmation to death [1].

We assumed the delay from confirmation to death followed the same distribution as the estimated delay from hospitalisation to death, based on data from the COVID-19 outbreak in Wuhan, China, between the 17th December 2019 and the 22nd January 2020, accounting for right-censoring in the data as a result of as-yet-unknown disease outcomes (Figure 1, panels A and B in [7]). The distribution used is a lognormal fit with a mean delay of 13 days and a standard deviation of 12.7 days [7].
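As an illustration, the stated mean and standard deviation can be converted to the log-scale parameters of a lognormal distribution and then turned into daily delay probabilities. This is a minimal sketch: the discretisation (the probability of a delay falling in the interval from day j to day j + 1) and the function names are assumptions made for illustration, not a description of the analysis code.

```python
import math

def lognormal_params(mean, sd):
    """Convert a natural-scale mean and sd to log-scale (mu, sigma)."""
    sigma_sq = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - 0.5 * sigma_sq
    return mu, math.sqrt(sigma_sq)

def lognormal_cdf(x, mu, sigma):
    """Cumulative distribution function of a lognormal distribution."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

def delay_pmf(max_delay, mean=13.0, sd=12.7):
    """Assumed discretisation: f_j = F(j + 1) - F(j) for j = 0..max_delay."""
    mu, sigma = lognormal_params(mean, sd)
    return [lognormal_cdf(j + 1, mu, sigma) - lognormal_cdf(j, mu, sigma)
            for j in range(max_delay + 1)]

mu, sigma = lognormal_params(13.0, 12.7)
f = delay_pmf(60)
print(f"mu = {mu:.3f}, sigma = {sigma:.3f}, mass within 61 days = {sum(f):.3f}")
```

The moment-matching step uses the standard identities for a lognormal distribution: the mean is exp(mu + sigma^2 / 2) and the variance is (exp(sigma^2) - 1) times the squared mean.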

To correct the CFR, we use the case and death incidence data to estimate the proportion of cases with known outcomes [1,6]:

\[ u_{t} = \frac{ \sum_{j = 0}^{t} c_{t-j} f_j}{c_t}, \]

where \(u_t\) represents the underestimation of the proportion of cases with known outcomes [1,5,6] and is used to scale the value of the cumulative number of cases in the denominator in the calculation of the cCFR, \(c_{t}\) is the daily case incidence at time \(t\), and \(f_j\) is the proportion of cases with a delay of \(j\) between confirmation and death.
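The correction can be sketched on toy data as follows. The numerator of the formula is evaluated for each day; summing it over days to obtain the cumulative number of cases with known outcomes, and dividing cumulative deaths by that total to obtain the cCFR, is an assumed reading of how the scaling is applied. The function names, case counts, and delay probabilities are all hypothetical.

```python
def expected_known_outcomes(cases, delay_pmf):
    """Numerator of u_t for each day t: sum over j = 0..t of c_{t-j} * f_j."""
    known = []
    for t in range(len(cases)):
        total = 0.0
        for j in range(min(t + 1, len(delay_pmf))):
            total += cases[t - j] * delay_pmf[j]
        known.append(total)
    return known

def naive_and_corrected_cfr(cases, deaths, delay_pmf):
    """Return (nCFR, cCFR) using cumulative totals (assumed aggregation)."""
    total_known = sum(expected_known_outcomes(cases, delay_pmf))
    if total_known <= 0:
        raise ValueError("no cases with known outcomes yet")
    return sum(deaths) / sum(cases), sum(deaths) / total_known

# Toy inputs (hypothetical): growing daily cases and a short delay distribution.
cases = [10, 20, 40, 80, 160]
deaths = [0, 0, 1, 1, 2]
toy_delay = [0.0, 0.1, 0.2, 0.3, 0.4]

ncfr, ccfr = naive_and_corrected_cfr(cases, deaths, toy_delay)
print(f"nCFR = {ncfr:.4f}, cCFR = {ccfr:.4f}")
```

Because most of the toy cases are recent, few of them have a known outcome yet, so the corrected ratio is considerably larger than the naive one.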

Approximating the proportion of symptomatic cases reported

At this stage, raw estimates of the CFR of COVID-19 correcting for delay to outcome, but not under-reporting, have been calculated. These estimates range between 1% and 1.5% [1–3]. We assume a CFR of 1.4% (95% CrI: 1.2-1.7%), taken from a recent large study [3], as a baseline CFR. We use it to approximate the potential level of under-reporting in each country. Specifically, we calculate \(\frac{1.4\%}{\text{cCFR}}\) for each country to estimate the approximate fraction of cases reported.
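A worked example of this ratio, as a minimal sketch: the cap at 100% is an assumption (a country whose cCFR is below the baseline cannot report more than all of its cases), and the function name and input values are hypothetical.

```python
BASELINE_CFR = 0.014  # the 1.4% baseline CFR assumed in the text

def proportion_reported(ccfr, baseline_cfr=BASELINE_CFR):
    """Approximate fraction of symptomatic cases reported: baseline / cCFR."""
    if ccfr <= 0:
        raise ValueError("cCFR must be positive")
    # Assumed cap: the reported fraction cannot exceed 1.
    return min(1.0, baseline_cfr / ccfr)

for ccfr in (0.10, 0.028, 0.010):
    print(f"cCFR = {ccfr:.1%} -> approx. {proportion_reported(ccfr):.0%} reported")
```

A cCFR of 10% therefore corresponds to roughly 14% of symptomatic cases being reported, while a cCFR of 2.8% corresponds to roughly 50%.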

Temporal variation model fitting

We estimate the level of under-reporting on every day for each country that has had more than ten deaths. We then fit a Gaussian Process (GP) model using the R packages greta and greta.gp. The parameters we fit and their priors are the following:

\[ \begin{aligned} &\sigma \sim \text{Log Normal(-1, 1)}: \quad &\text{Variance of the reporting kernel} \\ &\text{L} \sim \text{Log Normal(4, 0.5)}: \quad &\text{Lengthscale of the reporting kernel} \\ &\sigma_{\text{obs}} \sim \text{Truncated Normal(0, 0.5)}: \quad &\text{Variance of the observation kernel, truncated at 0} \end{aligned} \]

The kernel is split into two components: the reporting kernel \(R\), and the observation kernel \(O\). The reporting component has a standard squared-exponential form. For the observation component, we use an i.i.d. noise kernel to account for observation overdispersion, which can smooth out overly clumped death time-series. This is important as some countries have been known to report an unusually large number of deaths on a single day, due to past under-reporting.
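The two-component kernel can be illustrated with a small covariance matrix. This is a sketch in plain Python rather than the greta.gp model itself: it treats the fitted parameters as variances, following the labels above, and the parameter values used are illustrative choices (the exponentials of the first arguments of the two Log Normal priors, and an arbitrary observation variance), not fitted values.

```python
import math

def reporting_kernel(t1, t2, variance, lengthscale):
    """Squared-exponential kernel: variance * exp(-(t1 - t2)^2 / (2 * L^2))."""
    return variance * math.exp(-0.5 * ((t1 - t2) / lengthscale) ** 2)

def observation_kernel(t1, t2, variance):
    """i.i.d. noise kernel: contributes only on the diagonal."""
    return variance if t1 == t2 else 0.0

def covariance_matrix(times, variance, lengthscale, obs_variance):
    """Sum of the reporting and observation kernels over all pairs of days."""
    return [[reporting_kernel(a, b, variance, lengthscale)
             + observation_kernel(a, b, obs_variance)
             for b in times]
            for a in times]

# Illustrative parameter values only.
K = covariance_matrix(
    times=[0, 1, 2, 30],
    variance=math.exp(-1),
    lengthscale=math.exp(4),
    obs_variance=0.25,
)
for row in K:
    print(["%.3f" % value for value in row])
```

With a lengthscale of roughly 55 days, neighbouring days are almost perfectly correlated under the reporting kernel, so day-to-day jumps are absorbed by the noise term on the diagonal.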

In the sampling and fitting process, we calculate the expected number of deaths at each time-point, given the baseline CFR. We then use a Poisson likelihood for the observed number of deaths, with the expected number of deaths as its rate.
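The likelihood step can be sketched as below. The form of the expected deaths (baseline CFR multiplied by the reported cases with known outcomes, divided by the reporting proportion) is an assumption made for illustration, as are the function names and toy numbers; the sketch only evaluates the likelihood and does not perform the GP fit.

```python
import math

def expected_deaths(known_outcome_cases, reporting, baseline_cfr=0.014):
    """Assumed form: baseline CFR * (reported known-outcome cases / reporting)."""
    return [baseline_cfr * c / r for c, r in zip(known_outcome_cases, reporting)]

def poisson_log_likelihood(observed, expected):
    """Sum of Poisson log-probabilities; every expected value must be > 0."""
    total = 0.0
    for k, lam in zip(observed, expected):
        total += k * math.log(lam) - lam - math.lgamma(k + 1)
    return total

# Toy data (hypothetical): three days of known-outcome cases and deaths.
known = [100.0, 200.0, 400.0]
observed = [3, 5, 12]

ll_half = poisson_log_likelihood(observed, expected_deaths(known, [0.5] * 3))
ll_low = poisson_log_likelihood(observed, expected_deaths(known, [0.05] * 3))
print(f"log-likelihood at 50% reporting: {ll_half:.2f}")
print(f"log-likelihood at 5% reporting:  {ll_low:.2f}")
```

For these toy numbers a reporting proportion of 50% explains the observed deaths far better than 5%, which is the kind of comparison the sampler makes at each time-point.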

Adjusting case counts for under-reporting

We adjust the reported number of cases each day, pulled from the ECDC. Specifically, we divide the case numbers of each day by the “proportion of cases reported” estimate that we calculate for that day and country.
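A minimal sketch of this adjustment, including the interval: dividing by the upper bound of the reporting estimate gives the lower bound on cases, and vice versa. The function name and the numbers are hypothetical.

```python
def adjust_cases(reported, prop_mid, prop_low, prop_high):
    """Divide reported cases by the reporting proportion and its bounds.

    Returns (estimate, lower, upper). A higher reporting proportion implies
    fewer true cases, so the bounds swap roles.
    """
    return reported / prop_mid, reported / prop_high, reported / prop_low

# Toy example: 100 reported cases, 25% reported (interval 20%-40%).
estimate, lower, upper = adjust_cases(100, 0.25, 0.20, 0.40)
print(f"adjusted cases: {estimate:.0f} ({lower:.0f}-{upper:.0f})")
```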

Limitations

Implicit in assuming that the under-reporting is \(\frac{1.4\%}{\text{cCFR}}\) for a given country is that any deviation from the assumed 1.4% CFR is entirely down to under-reporting. In reality, the burden on the healthcare system is a likely contributing factor to CFR estimates higher than 1.4%, along with many other country-specific factors.

The following is a list of the other prominent assumptions made in our analysis:

Code and data availability

The code is publicly available at https://github.com/thimotei/CFR_calculation. The data required for this analysis are time-series of both cases and deaths, along with the corresponding delay distribution. We scrape these data from the ECDC, using the NCoVUtils package [8].

Acknowledgements

The authors, on behalf of the Centre for the Mathematical Modelling of Infectious Diseases (CMMID) COVID-19 working group, wish to thank DSTL for providing the High Performance Computing facilities and associated expertise that has enabled these models to be prepared, run and processed in an appropriately rapid and highly efficient manner.

References

1 Russell TW, Hellewell J, Jarvis CI et al. Estimating the infection and case fatality ratio for COVID-19 using age-adjusted data from the outbreak on the Diamond Princess cruise ship. medRxiv 2020.

2 Verity R, Okell LC, Dorigatti I et al. Estimates of the severity of COVID-19 disease. medRxiv 2020.

3 Guan W-j, Ni Z-y, Hu Y et al. Clinical characteristics of coronavirus disease 2019 in China. New England Journal of Medicine 2020.

4 Shim E, Mizumoto K, Choi W et al. Estimating the risk of COVID-19 death during the course of the outbreak in Korea, February-March, 2020. medRxiv 2020.

5 Kucharski AJ, Edmunds WJ. Case fatality rate for Ebola virus disease in West Africa. The Lancet 2014;384:1260.

6 Nishiura H, Klinkenberg D, Roberts M et al. Early epidemiological assessment of the virulence of emerging infectious diseases: A case study of an influenza pandemic. PLoS One 2009;4.

7 Linton NM, Kobayashi T, Yang Y et al. Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: A statistical analysis of publicly available case data. Journal of Clinical Medicine 2020;9:538.

8 Abbott S MJ Hellewell J. NCoVUtils: Utility functions for the 2019-nCoV outbreak. doi:10.5281/zenodo.3635417 2020.